Olist case

Please note: that This is just a quick look of me into datasets. Maybe, I will do more later.

Some first surprises when EDA datasets

  • As I know from previous notebooks and dataset descriptions. Olist is just a ecommerce solution, they just provide solutions for many sellers and buyers to gain the surplus.

  • I am from VietNam, which has many active and competitive marketplace platforms. From my vision in VietNam ecommerce. In basic, The market is divided into 2 main types:

  1. VietNam start-up companies that operate based on the funding by themselves ( You can search Tiki and Sendo to know more their strategies and special operations)

  2. Foreign companies that operate based on the funding from their mother company ( You can search Shopee and Lazada to know more their strategies and special operations)

It is probably not relevant to Olist, but I just want to provide you some other ways that how ecommerce are manipulated strategy to find a new way to survive.

In last year, many ecommerce companies have given up and filed a bankrupcy because of hot competitive marketplace. ** Ecommerce is burning money **

Tiki is a very special ecommerce in VietNam that has a 2h last mile delivery for Tikinow (Tiki has hired some people from Amazon and Facebook). Customers segment of Tiki is mainly located in City. To be honest, I wish I could work in Tiki environment because of their very special strategy and vision.

Sendo is known as VietNam start-up (like as Tiki). They focus on customers located in countryside and then widen to city

In recently, There is some information that Tiki and Sendo are prepared to merge.

Lazada ans Shopee: Lazada is from giant Alibaba in China and Shopee is from giant SEA located in Singapore. They also burn many and many money to gain the marketshare.

Top 1 of the number of suffering is Shopee Top 2 of the number of suffering is Lazada Top 3 of the number of suffering is Tiki Top 4 of the number of suffering is Sendo

Olist model

Unlikely VietNam companies, Olist operates very differently. They just provide solution and platform for their customers (sellers and buyers). I am surprised that the leadtime for shipping product is mainly 12 days. Moreover there is some cases that are 42 days.

           - There is no stock, warehouse
           - There is no shipping product
           - Don't buy products 
           

To know more Olist. I suggest that you should click on Another_notebook. The author mention very well how Olist oprate it model. On the other hand, I learn from this notebook very much.


Preparation: Libraries and Datasets

Here are some techniques that I use to standardize datasets for convenient purposes. Likewise, packages are used up to help me to intepret datasets.


Load libraries

Load datasets

Standardize the datasets


It’s time to have a first look at Olist datasets

Customer

I don’t know why they just store in prefix-zip-code in datasets. Because This is not enough elements to determine each geolocation for each customers. To visualize geolocation, I just calculate the mean of longitude and latitude to have a general view.

           - customer_id
           - customer_unique_id
           - customer_zip_code_prefix
           - customer_city
           - customer_state
           
customer_id customer_unique_id customer_zip_code_prefix customer_city customer_state
06b8999e2fba1a1fbc88172c00ba8bc7 861eff4711a542e4b93843c6dd7febb0 14409 Franca SP
18955e83d337fd6b2def6b18a428ac77 290c77bc529b7ac935b93aa66c333dc3 9790 Sao Bernardo Do Campo SP
4e7b3e00288586ebd08712fdd0374a03 060e732b5b29e8181a18229c7b0b2b5e 1151 Sao Paulo SP
b2b6027bc5c5109e529d4dc6358b12c3 259dac757896d24d7702b9acbbff3f3c 8775 Mogi Das Cruzes SP
4f2d8ab171c80ec8364f7c12e35b23ad 345ecd01c38d18a9036ed96c73b8d066 13056 Campinas SP
879864dab9bc3047522c92c82e1212b8 4c93744516667ad3b8f1fb645a3116a4 89254 Jaragua Do Sul SC
fd826e7cf63160e536e0908c76c3f441 addec96d2e059c80c30fe6871d30d177 4534 Sao Paulo SP
5e274e7a0c3809e14aba7ad5aae0d407 57b2a98a409812fe9618067b6b8ebe4f 35182 Timoteo MG
5adf08e34b2e993982a47070956c5c65 1175e95fb47ddff9de6b2b06188f7e0d 81560 Curitiba PR
4b7139f34592b3a31687243a302fa75b 9afe194fb833f79e300e37e580171f22 30575 Belo Horizonte MG

Geolocation

There are some typo in processing text, but it’s more popular in less frequent geolocation_city

        - geolocation_zip_code_prefix
           - geolocation_lat
           - geolocation_lng
           - geolocation_city
           - geolocation_state
           
           
geolocation_zip_code_prefix geolocation_lat geolocation_lng geolocation_city geolocation_state
1037 -23.54562 -46.63929 Sao Paulo SP
1046 -23.54608 -46.64482 Sao Paulo SP
1046 -23.54613 -46.64295 Sao Paulo SP
1041 -23.54439 -46.63950 Sao Paulo SP
1035 -23.54158 -46.64161 Sao Paulo SP
1012 -23.54776 -46.63536 Sã£O Paulo SP
1047 -23.54627 -46.64123 Sao Paulo SP
1013 -23.54692 -46.63426 Sao Paulo SP
1029 -23.54377 -46.63428 Sao Paulo SP
1011 -23.54764 -46.63603 Sao Paulo SP

Order

There are some typo in processing text, but it’s more popular in less frequent geolocation_city

        - order_id
           - customer_id
           - order_status
           - order_purchase_timestamp
           - order_approved_at
           - order_delivered_carrier_date
           - order_delivered_customer_date
           - order_estimated_delivery_date
           
           
order_id customer_id order_status order_purchase_timestamp order_approved_at order_delivered_carrier_date order_delivered_customer_date order_estimated_delivery_date
e481f51cbdc54678b7cc49136f2d6af7 9ef432eb6251297304e76186b10a928d delivered NA 2017-10-02 11:07:15 2017-10-04 19:55:00 2017-10-10 21:25:13 2017-10-18 00:00:00
53cdb2fc8bc7dce0b6741e2150273451 b0830fb4747a6c6d20dea0b8c802d7ef delivered NA 2018-07-26 03:24:27 2018-07-26 14:31:00 2018-08-07 15:27:45 2018-08-13 00:00:00
47770eb9100c2d0c44946d9cf07ec65d 41ce2a54c0b03bf3443c3d931a367089 delivered NA 2018-08-08 08:55:23 2018-08-08 13:50:00 2018-08-17 18:06:29 2018-09-04 00:00:00
949d5b44dbf5de918fe9c16f97b45f8a f88197465ea7920adcdbec7375364d82 delivered NA 2017-11-18 19:45:59 2017-11-22 13:39:59 2017-12-02 00:28:42 2017-12-15 00:00:00
ad21c59c0840e6cb83a9ceb5573f8159 8ab97904e6daea8866dbdbc4fb7aad2c delivered NA 2018-02-13 22:20:29 2018-02-14 19:46:34 2018-02-16 18:17:02 2018-02-26 00:00:00
a4591c265e18cb1dcee52889e2d8acc3 503740e9ca751ccdda7ba28e9ab8f608 delivered NA 2017-07-09 22:10:13 2017-07-11 14:58:04 2017-07-26 10:57:55 2017-08-01 00:00:00
136cce7faa42fdb2cefd53fdc79a6098 ed0271e0b7da060a393796590e7b737a invoiced NA 2017-04-13 13:25:17 2017-05-09 00:00:00
6514b8ad8028c9f2cc2374ded245783f 9bdf08b4b3b52b5526ff42d37d47f222 delivered 1970-01-01 07:00:30 2017-05-16 13:22:11 2017-05-22 10:07:46 2017-05-26 12:55:51 2017-06-07 00:00:00
76c6e866289321a7c93b82b54852dc33 f54a9f0e6b351c431402b8461ea51999 delivered NA 2017-01-25 02:50:47 2017-01-26 14:16:31 2017-02-02 14:08:10 2017-03-06 00:00:00
e69bfb5eb88e0ed6a785585b27e16dbf 31ad1d1b63eb9962463f764d4e6e0c9d delivered NA 2017-07-29 12:05:32 2017-08-10 19:45:24 2017-08-16 17:14:30 2017-08-23 00:00:00

Item

I don’t understand Is order_item_id like as order quantity?

        - order_id
           - order_item_id
           - product_id
           - seller_id
           - shipping_limit_date
           - price
           - freight_value
           
           
order_id order_item_id product_id seller_id shipping_limit_date price freight_value
00010242fe8c5a6d1ba2dd792cb16214 1 4244733e06e7ecb4970a6e2683c13e61 48436dade18ac8b2bce089ec2a041202 NA 58.90 13.29
00018f77f2f0320c557190d7a144bdd3 1 e5f2d52b802189ee658865ca93d83a8f dd7ddc04e1b6c2c614352b383efe2d36 1970-01-01 07:00:13 239.90 19.93
000229ec398224ef6ca0657da4fc703e 1 c777355d18b72b67abbeef9df44fd0fd 5b51032eddd242adc84c38acab88f23d NA 199.00 17.87
00024acbcdf0a6daa1e931b038114c75 1 7634da152a4610f1595efa32f14722fc 9d7a1d34a5052409006425275ba1c2b4 1970-01-01 07:00:18 12.99 12.79
00042b26cf59d7ce69dfabb4e55b4fd9 1 ac6c3623068f30de03045865e4e10089 df560393f3a51e74553ab94004ba5c87 NA 199.90 18.14
00048cc3ae777c65dbb7d2a0634bc1ea 1 ef92defde845ab8450f9d70c526ef70f 6426d21aca402a131fc0a5d0960a3c90 NA 21.90 12.69
00054e8431b9d7675808bcb819fb4a32 1 8d4f2bb7e93e6710a28f34fa83ee7d28 7040e82f899a04d1b434b795a43b4617 1970-01-01 07:00:31 19.90 11.85
000576fe39319847cbb9d288c5617fa6 1 557d850972a7d6f792fd18ae1400d9b6 5996cddab893a4652a15592fb58ab8db NA 810.00 70.75
0005a1a1728c9d785b8e2b08b904576c 1 310ae3c140ff94b03219ad0adc3c778f a416b6a846a11724393025641d4edd5e NA 145.95 11.65
0005f50442cb953dcd1d21e1fb923495 1 4535b0e1091c278dfd193e5a1d63b39f ba143b05f0110f0dc71ad71b4466ce92 1970-01-01 07:00:56 53.99 11.40

Payment

I don’t understand payment_sequential like as payment quantity?

        - order_id
           - payment_sequential
           - payment_type
           - payment_installments
           - payment_value
           
           
order_id payment_sequential payment_type payment_installments payment_value
b81ef226f3fe1789b1e8b2acac839d17 1 Credit card 8 99.33
a9810da82917af2d9aefd1278f1dcfa0 1 Credit card 1 24.39
25e8ea4e93396b6fa0d3dd708e76c1bd 1 Credit card 1 65.71
ba78997921bbcdc1373bb41e913ab953 1 Credit card 8 107.78
42fdf880ba16b47b59251dd489d4441a 1 Credit card 2 128.45
298fcdf1f73eb413e4d26d01b25bc1cd 1 Credit card 2 96.12
771ee386b001f06208a7419e4fc1bbd7 1 Credit card 1 81.16
3d7239c394a212faae122962df514ac7 1 Credit card 3 51.84
1f78449c87a54faf9e96e88ba1491fa9 1 Credit card 6 341.09
0573b5e23cbd798006520e1d5b4c6714 1 Boleto 1 51.95

Review

I don’t know Portugese, so the review comment is quite difficult for me to understand. But my first think approach is use sentiment

        - review_id
           - order_id
           - review_score
           - review_comment_title
           - review_comment_message
           - review_creation_date
           - review_answer_timestamp
           
           
score comment creation_date answer
4 1970-01-01 07:00:00 2018-01-18 21:46:59
5 1970-01-01 07:00:00 2018-03-11 03:05:13
5 1970-01-01 07:00:00 2018-02-18 14:36:24
5 Recebi bem antes do prazo estipulado. 1970-01-01 07:00:00 2017-04-21 22:02:06
5 Parabéns lojas lannister adorei comprar pela Internet seguro e prático Parabéns a todos feliz Páscoa 1970-01-01 07:00:00 2018-03-02 10:26:53
1 1970-01-01 07:00:00 2018-04-16 00:39:37
5 1970-01-01 07:00:00 2017-07-18 19:30:34
5 1970-01-01 07:00:00 2018-08-14 21:36:06
5 1970-01-01 07:00:00 2017-05-18 12:05:37
4 aparelho eficiente. no site a marca do aparelho esta impresso como 3desinfector e ao chegar esta com outro nome…atualizar com a marca correta uma vez que é o mesmo aparelho 1970-01-01 07:00:00 2018-05-23 16:45:47

Seller

        - seller_city
           - seller_id
           - seller_zip_code_prefix
           - seller_state
           
           
seller_city seller_id seller_zip_code_prefix seller_state
Campinas 3442f8959a84dea7ee197c632cb2df15 13023 SP
Mogi Guacu d1b65fc7debc3361ea86b5f14c68d2e2 13844 SP
Rio De Janeiro ce3ad9de960102d0677a81f5d0bb7b2d 20031 RJ
Sao Paulo c0f3eea2e14555b6faeea3dd58c1b1c3 4195 SP
Braganca Paulista 51a04a8a6bdcb23deccc82b0b80742cf 12914 SP
Rio De Janeiro c240c4061717ac1806ae6ee72be3533b 20920 RJ
Brejao e49c26c3edfa46d227d5121a6b6e4d37 55325 PE
Penapolis 1b938a7ec6ac5061a66a3766e0e75f90 16304 SP
Sao Paulo 768a86e36ad6aae3d03ee3c6433d61df 1529 SP
Curitiba ccc4bbb5f32a6ab2b7066a4130f114e3 80310 PR

Product

        - product_category_name
           - weight
           - quantity
           - height
           - widths
           
product_category_name weight quantity height width
Perfumaria 225 1 10 14
Artes 1000 1 18 20
Esporte_lazer 154 1 9 15
Bebes 371 1 4 26
Utilidades_domesticas 625 4 17 13
Instrumentos_musicais 200 1 5 11
Cool_stuff 18350 4 24 44
Moveis_decoracao 900 2 8 40
Eletrodomesticos 400 1 13 17
Brinquedos 600 1 10 12

Product_Category_Name

        - product_category_name
           - product_category_name_english
           
product_category_name product_category_name_english
beleza_saude Health_beauty
informatica_acessorios Computers_accessories
automotivo Auto
cama_mesa_banho Bed_bath_table
moveis_decoracao Furniture_decor
esporte_lazer Sports_leisure
perfumaria Perfumery
utilidades_domesticas Housewares
telefonia Telephony
relogios_presentes Watches_gifts

Customers and Seller dive deeply

  1. Statistical column in city scope
  2. Statistical column in state scope
  3. Geographical map in city scope
  4. Geographical map in state scope
  5. Geographical payment type
  6. The number of increasing userids and sellers

Customers Perspective

## [1] "Cities"
## [1] "Cities that have equal and more than 100 customers"
## [1] "This dataset have some duplicated rows, the long and lat of different cities are the same"
## [1] "States"
## [1] "There are 27 states in Brazil"
## [1] "This dataset have some duplicated rows, the long and lat of different cities are the same"

## [1] "Duplicated recoreds, Because of the first time in cleaning mistakes, I can't visualize it within city. Likewise, State scope is a potential choice"
## 
## FALSE 
## 99441
## [1] "Yes"


Sellers Perspective

## [1] "Cities"
## [1] "Cities that have equal and more than 100 sellers"
## [1] "This dataset have some duplicated rows, the long and lat of different cities are the same"
## [1] "States"
## [1] "There are 27 states in Brazil"
## [1] "This dataset have some duplicated rows, the long and lat of different cities are the same"